The source code and the full notebook (including all functions) for this article are available here.
Note: the HTML version of this notebook was created with nbconvert by running `jupyter nbconvert --to html --TagRemovePreprocessor.remove_input_tags="notvisible" notebook.ipynb`. A "notvisible" tag was added to the cells that are not displayed in the rendered HTML.
This study follows the post and notebook published earlier here. In that post, we used Optuna to search a space of polynomial kernels of varying complexity and find an optimal kernel (automated kernel engineering) for a Gaussian Process model. In this post, we use the same approach to train Gaussian Process regression models on data generated from a set of 24 benchmark functions, and we observe how increasing the dataset size affects the performance of the trained models. The study focuses on 1D functions; we plan to extend it to higher input dimensions.
Before digging deeper into the details, we show the summary of this study below.
In this section, we show how we generate data from an example benchmark function, the dropwave function. This function, along with several others, is typically used to benchmark optimization algorithms. We used some of the benchmark functions available from SFU. These functions serve as our true generative process.
The function generate_data generates a grid of input values to sample from the dropwave function, which is defined as a lambda function. The grid is built using the range of the input variable X1, which is specified in the design-space dictionary. The design space can also specify whether to generate the data with noise; we will come back to this point later in this notebook. For now, we set the noise ('err') to 0.
# Design space
exp_space_dict = {'X1' : {'values' : [-2,2], 'val_type': 'float', 'cl': 'inp', 'err': 0},
'Z': {'values': [0, 1], 'val_type': 'float', 'cl': 'out', 'err': 0}}
# True generative process function
#sphere_b = lambda x_in: np.square(x_in).sum(axis=1).reshape(-1,1)
dropwave_b = lambda x_in: (-(1.0 + np.cos(12 * np.sqrt(np.square(x_in).sum(axis=1))))/((1.0/len(x_in))*np.square(x_in).sum(axis=1) + 2)).reshape(-1,1)
X, Z = generate_data(exp_space_dict, dropwave_b, num_samples = 1_000, random_state=1)
X_train, Z_train, X_noise_std, Z_noise_std = select_training_data(X, Z, exp_space_dict, dropwave_b, is_error=False, training_size=10, random_state=1)
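The helpers generate_data and select_training_data are defined in the full notebook linked above. As a rough, hypothetical sketch of what uniform sampling from the design-space dictionary could look like (the real helpers also handle grids, noise, and training-subset selection):

```python
import numpy as np

def generate_data_sketch(space, func, num_samples=1000, random_state=None):
    """Sample inputs uniformly within the design-space ranges and evaluate
    the true generative function on them (simplified sketch)."""
    rng = np.random.default_rng(random_state)
    inp_keys = [k for k, v in space.items() if v['cl'] == 'inp']
    cols = [rng.uniform(*space[k]['values'], size=num_samples) for k in inp_keys]
    X = np.column_stack(cols)
    return X, func(X)

space = {'X1': {'values': [-2, 2], 'val_type': 'float', 'cl': 'inp', 'err': 0}}
f = lambda x: np.sin(x.sum(axis=1)).reshape(-1, 1)  # toy stand-in for dropwave_b
X_toy, Z_toy = generate_data_sketch(space, f, num_samples=1000, random_state=1)
```

The names generate_data_sketch and the toy function here are illustrative only; the notebook's own helpers do the actual work.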
The selected training dataset has 10 data rows.
# The dataframe with the measured observations, which will be used for training a model
train_df = pd.DataFrame(np.concatenate([X_train,Z_train], axis=1), columns=['X1','Z'])
print(f'Shape of the training set: {train_df.shape}')
train_df.head(3)
Shape of the training set: (10, 2)
|   | X1 | Z |
|---|---|---|
| 0 | -1.259478 | -0.085640 |
| 1 | 0.867083 | -0.221497 |
| 2 | 1.623238 | -0.903030 |
The full generated dataset has 1000 data rows.
# This is the larger data frame holding about 1000 sampled observations
generated_df = pd.DataFrame(np.concatenate([X,Z], axis=1), columns=['X1','Z'])
print(f'Shape of the generated set: {generated_df.shape}')
generated_df.head(3)
Shape of the generated set: (1000, 2)
|   | X1 | Z |
|---|---|---|
| 0 | -0.331912 | -0.166763 |
| 1 | 0.881298 | -0.296005 |
| 2 | -1.999543 | -0.708185 |
We can now fit a Gaussian Process regression model to the training data generated above using scikit-learn's gaussian_process module. We let the optimizer find the best parameter values for the kernel of the Gaussian Process regression model; the optimizer adapts the parameters to maximize the log marginal likelihood. We selected the RBF kernel for this exercise.
init_kernel = 1 * RBF(length_scale=1.0, length_scale_bounds=(1e-5, 1e5))
gaussian_process, llh, kernel = fit_model(train_df, ['X1'], [], ['X1'], ['Z'], init_kernel, optimizer="fmin_l_bfgs_b")
# kernel optimized with LLH
print(f"kernel: {kernel}")
print(f"llh: {llh}")
# predictions for the whole set
mean_prediction, std_prediction = gaussian_process.predict(generated_df[['X1']], return_std=True)
mean_prediction = mean_prediction.reshape(-1,1)
std_prediction = std_prediction.reshape(-1,1)
mse_all = mean_squared_error(generated_df[['Z']], mean_prediction)
r2_all = r2_score(generated_df[['Z']], mean_prediction)
print(f'r2: {r2_all}')
print(f'mse: {mse_all}')
Fitting with kernel: 1**2 * RBF(length_scale=1). Training kernel - kernel will change
kernel: 0.624**2 * RBF(length_scale=0.0998)
llh: -8.899513414027057
r2: -0.13454569124267324
mse: 0.1357915901975852
As the model performance indicators and the graph below show, 10 data points do not yield a satisfactory model. Later, we will show how the performance changes as we iteratively increase the dataset size.
gp_plot = create_gp_regression_plot(X, Z, X_train, Z_train, X_noise_std, Z_noise_std, mean_prediction, std_prediction)
gp_plot
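The helper fit_model is defined in the full notebook. Judging from its use here (and from the gaussian_process[-1] indexing that appears later), it appears to wrap a scikit-learn pipeline ending in a GaussianProcessRegressor; a minimal, hypothetical sketch might be:

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def fit_model_sketch(df, x_cols, z_cols, init_kernel, optimizer="fmin_l_bfgs_b"):
    """Fit a scaler + GPR pipeline; the optimizer tunes the kernel
    hyperparameters by maximizing the log marginal likelihood (sketch)."""
    pipe = make_pipeline(
        StandardScaler(),
        GaussianProcessRegressor(kernel=init_kernel, optimizer=optimizer,
                                 n_restarts_optimizer=5, random_state=0),
    )
    pipe.fit(df[x_cols].values, df[z_cols].values.ravel())
    gpr = pipe[-1]
    llh = gpr.log_marginal_likelihood(gpr.kernel_.theta)
    return pipe, llh, gpr.kernel_

# toy usage on synthetic data
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=(15, 1))
toy_df = pd.DataFrame({'X1': x.ravel(), 'Z': np.sin(3 * x).ravel()})
model, llh, kernel = fit_model_sketch(
    toy_df, ['X1'], ['Z'],
    1 * RBF(length_scale=1.0, length_scale_bounds=(1e-5, 1e5)))
```

Note that fit_model_sketch, its signature, and the scaling step are assumptions for illustration; the real fit_model may differ.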
Given the 10-point training array, we generate an additional point on the input grid that satisfies the maximin property. This approach samples the true generating function where we lack data. The function generate_samples generates many random candidate points on the specified grid and then applies the maximin condition to select the additional data points.
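generate_samples is defined in the full notebook; a simplified, hypothetical sketch of candidate-based maximin selection (draw random candidates, then greedily pick the ones farthest from all existing points) could look like:

```python
import numpy as np

def generate_samples_sketch(X_existing, space, k_points=1,
                            n_candidates=1000, random_state=None):
    """Greedily pick k candidate points that maximize the minimum
    distance to the already-sampled inputs (maximin criterion, 1D sketch)."""
    rng = np.random.default_rng(random_state)
    lo, hi = space['X1']['values']
    cand = rng.uniform(lo, hi, size=(n_candidates, 1))
    X_cur = X_existing.copy()
    chosen = []
    for _ in range(k_points):
        # distance of each candidate to its nearest existing point
        d = np.abs(cand - X_cur.T).min(axis=1)
        best = int(np.argmax(d))  # candidate farthest from all current data
        chosen.append(cand[best])
        X_cur = np.vstack([X_cur, cand[best:best + 1]])
    return np.vstack(chosen)

space = {'X1': {'values': [-2, 2], 'val_type': 'float', 'cl': 'inp', 'err': 0}}
X_train_toy = np.array([[-2.0], [2.0]])
X_new_toy = generate_samples_sketch(X_train_toy, space, k_points=1, random_state=1)
```

With existing points at -2 and 2, the maximin candidate lands near 0, which illustrates why this strategy fills unsampled regions.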
# Generate new points from the true function based on maxmin
X_new = generate_samples(X_train, exp_space_dict, k_points=1)
Z_new = dropwave_b(X_new)
X_new = np.concatenate([X_train,X_new], axis=0)
Z_new = np.concatenate([Z_train,Z_new], axis=0)
train_df = pd.DataFrame(np.concatenate([X_new,Z_new], axis=1), columns=['X1','Z'])
# Refit the model, initializing with the previously optimized kernel
init_kernel = gaussian_process[-1].kernel_
gaussian_process, llh, kernel = fit_model(train_df, ['X1'], [], ['X1'], ['Z'], init_kernel, optimizer="fmin_l_bfgs_b")
# kernel optimized with LLH
print(f"kernel: {kernel}")
print(f"llh: {llh}")
# predictions for the whole set
mean_prediction, std_prediction = gaussian_process.predict(generated_df[['X1']], return_std=True)
mean_prediction = mean_prediction.reshape(-1,1)
std_prediction = std_prediction.reshape(-1,1)
mse_all = mean_squared_error(generated_df[['Z']], mean_prediction)
r2_all = r2_score(generated_df[['Z']], mean_prediction)
print(f'r2: {r2_all}')
print(f'mse: {mse_all}')
Generating 1000 in {'X1': {'values': [-2, 2], 'val_type': 'float', 'cl': 'inp', 'err': 0}, 'Z': {'values': [0, 1], 'val_type': 'float', 'cl': 'out', 'err': 0}}
Fitting with kernel: 0.624**2 * RBF(length_scale=0.0998). Training kernel - kernel will change
kernel: 0.618**2 * RBF(length_scale=0.104)
llh: -9.737947022962505
r2: 0.05962787737035358
mse: 0.11255132948369292
As we can see, adding one point improves the model performance. Note that the new point falls between -1 and 0, a region that was not sampled before.
gp_plot = create_gp_regression_plot(X, Z, X_new, Z_new, X_noise_std, Z_noise_std, mean_prediction, std_prediction)
gp_plot
To qualitatively assess the performance of Gaussian Process regression models trained on increasing dataset sizes, we set up sequential training over different benchmark functions. For each benchmark function, the underlying polynomial kernel of the GPR model is selected using automated kernel engineering, as described in a previous post here (the notebook is available here under the notebook folder).
We prepared a dataset that also contains the lambda functions we need to generate our training observations.
df_1D = pd.read_csv('../data/opti_functions_1D.csv')
df_1D = df_1D[df_1D.Select==1]
print(f'Shape of the benchmark function dataset: {df_1D.shape}')
df_1D.head(2)
Shape of the benchmark function dataset: (25, 9)
|   | Function Name | link | lambda | xrange | zmin | zmax | markdown | Select | kernel |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Bird-Like Function | /benchmarks/unconstrained/1-dimension/221-bird... | f = lambda x: (2*x**4 + x**2 + 2) / (x**4 + 1) | [-4, 4] | [2] | [-1, 1] | $f(x) = \frac{2x^4 + x^2 + 2}{x^4 + 1}$ | 1 | 0.191**2 * RBF(length_scale=0.269) + WhiteKern... |
| 1 | Gramacy-Lee's Function No.01 | /benchmarks/unconstrained/1-dimension/258-gram... | f = lambda x: np.sin(10 * np.pi * x) / (2 * x)... | [0.5, 2.5] | [0.548563444114526] | [] | $f(x) = \frac{\sin(10 \pi x)}{2x} + (x - 1)^4$ | 1 | 5.6**2 * RBF(length_scale=1.92) + WhiteKernel(... |
df_1D.loc[12]
Function Name Problem No.09 (Timonov's Function No.03 or Zil...
link /benchmarks/unconstrained/1-dimension/37-zilin...
lambda f = lambda x: -np.sin(x) - np.sin(2*x/3)
xrange [3.1, 20.4]
zmin []
zmax [17.039198942112002]
markdown $f(x) = -\sin(x) - \sin\left(\frac{2x}{3}\right)$
Select 1
kernel 1.43**2 * RBF(length_scale=0.608) + WhiteKerne...
Name: 12, dtype: object
Sequential training starts with 10 initial generated data points and proceeds until it reaches 40 data points. If the model performance (measured with R²) stays above 0.98 for more than 5 consecutively added points, the iterative loop stops early. For the polynomial kernel, we use the kernel functions available in sklearn: RBF, RationalQuadratic, ExpSineSquared, Matern, and DotProduct. The polynomial kernel has at most 3 additive terms, and each additive term has at most 2 product factors. The output of this cell is omitted.
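The early-stopping rule described above can be sketched as follows (a hypothetical helper; the actual check lives inside sequential_training):

```python
def r2_plateau_reached(r2_history, threshold=0.98, patience=5):
    """True when R^2 has stayed above `threshold` for the last
    `patience` consecutively added points (sketch of the stopping rule)."""
    if len(r2_history) < patience:
        return False
    return all(r2 > threshold for r2 in r2_history[-patience:])

print(r2_plateau_reached([0.95, 0.99, 0.99, 0.99, 0.99, 0.99]))  # True
print(r2_plateau_reached([0.99, 0.99, 0.95, 0.99, 0.99]))        # False
```

The function name and exact threshold handling are assumptions; only the stated criterion (R² above 0.98 for 5 consecutive additions) comes from the text.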
save_path = '../data/imgs_1D'
output_catcher = StringIO()
for fi in df_1D.index:
    if fi==5:
        continue
    #if fi>=2:
    #    continue
    exec_scope = {}
    f_str = str(df_1D.loc[fi,'Function Name'])
    f_str = "_".join(f_str.replace("'",'').replace("-","_").replace(".","").replace(",","").replace("(","").replace(")","").split())
    print(f'{fi} out of {max(df_1D.index)}, {f_str}')
    lambda_str = str(df_1D.loc[fi,'lambda']).split('f = ')[1]
    xrange_str = str(df_1D.loc[fi,'xrange']).replace('pi', 'np.pi')
    code_to_exec = f'''import numpy as np\nf = {lambda_str}\nxrange = {xrange_str}'''
    try:
        with redirect_stdout(output_catcher):
            exec(code_to_exec, exec_scope)
    except Exception as e:
        print(f"An error occurred: {e}")
    f = exec_scope['f']
    xrange = exec_scope['xrange']
    benchmark_dict = {'X1' : {'values': xrange, 'val_type': 'float', 'cl': 'inp', 'err': 0.05},
                      'Z' : {'val_type': 'float', 'cl': 'out', 'err': 0.05}}
    !rm ../images/*
    res_dict = sequential_training(exp_space = benchmark_dict,
                                   use_error = False,
                                   generating_func=f,
                                   init_training_size=10,
                                   add_training_size = 1,
                                   end_training_size = 40,
                                   random_state = 1,
                                   graph_path='../images')
    res_plot = plot_results(res_dict)
    pio.write_image(res_plot, f"{save_path}/graph_{str(f_str)}.png")
    df_1D.loc[fi, 'kernel'] = str(res_dict['kernel'][-1])
    !ffmpeg -framerate 2 -i ../images/graph_%d.png -vf "palettegen" ../images/palette.png
    !ffmpeg -framerate 2 -i ../images/graph_%d.png -i ../images/palette.png -lavfi "paletteuse" ../images/movie.gif
    !mv ../images/movie.gif {save_path}/movie_{str(f_str)}.gif
    # Clear the output
    clear_output(wait=True)
# remove output before converting to html
We end this notebook by showing how measurement uncertainty in the input and target values can affect the performance of the GPR model. Depending on the magnitude of the uncertainty, achieving reasonable predictive performance may require more iterations and/or measurements during sequential training. This is, in fact, the most common situation in scientific experimentation.
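A simple way to model this uncertainty (an assumption about how the notebook's helpers inject noise; the real implementation may differ) is additive zero-mean Gaussian noise with the stated absolute standard deviation:

```python
import numpy as np

def add_measurement_noise(values, err, rng):
    """Perturb measurements with zero-mean Gaussian noise of absolute
    standard deviation `err` (hypothetical helper for illustration)."""
    return values + rng.normal(0.0, err, size=values.shape)

rng = np.random.default_rng(1)
x_true = np.linspace(-2, 2, 5).reshape(-1, 1)
x_meas = add_measurement_noise(x_true, 0.05, rng)
```

Using an absolute (rather than relative) standard deviation keeps the perturbation well defined even when a measured value is exactly zero.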
# absolute uncertainties are reported to avoid the case where a value is zero
exp_space_dict = {'X1' : {'values' : [-2,2], 'val_type': 'float', 'cl': 'inp', 'err': 0.05},
'Z': {'values': [0, 1], 'val_type': 'float', 'cl': 'out', 'err': 0.05}}
X, Z = generate_data(exp_space_dict, dropwave_b, num_samples = 1_000, random_state=1)
X_train, Z_train, X_noise_std, Z_noise_std = select_training_data(X, Z, exp_space_dict,
dropwave_b,
is_error=True,
training_size=30,
random_state=1)
train_df = pd.DataFrame(np.concatenate([X_train,Z_train], axis=1), columns=['X1','Z'])
generated_df = pd.DataFrame(np.concatenate([X,Z], axis=1), columns=['X1','Z']) # Note: remove training data from generated data
In the example, we add an uncertainty of 0.05 to both the input X1 and the output Z when generating the data. As we can see, even with 30 points the model shows a large uncertainty.
# fit initial model
init_kernel = 1 * RBF() + RBF()*WhiteKernel()
gaussian_process, llh, kernel = fit_model(train_df, ['X1'], [], ['X1'], ['Z'], init_kernel, optimizer="fmin_l_bfgs_b")
# kernel optimized with LLH
print(f"kernel: {kernel}")
print(f"llh: {llh}")
# predictions for the whole set
mean_prediction, std_prediction = gaussian_process.predict(generated_df[['X1']], return_std=True)
mean_prediction = mean_prediction.reshape(-1,1)
std_prediction = std_prediction.reshape(-1,1)
mse_all = mean_squared_error(generated_df[['Z']], mean_prediction)
r2_all = r2_score(generated_df[['Z']], mean_prediction)
print(f'r2: {r2_all}')
print(f'mse: {mse_all}')
gp_plot = create_gp_regression_plot(X, Z, X_train, Z_train, X_noise_std, Z_noise_std, mean_prediction, std_prediction)
Here, we used 30 measurements and added a noise (WhiteKernel) term to the kernel when fitting our model.
Copyright (c) [2024] [Alessio Tamburro]
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.